Acquiring Hyponymy Relations from Web Documents
Abstract
This paper describes an automatic method for acquiring hyponymy relations from HTML documents on the WWW. Hyponymy relations can play a crucial role in various natural language processing systems. Most existing acquisition methods for hyponymy relations rely on particular linguistic patterns, such as “NP such as NP”. Our method, however, does not use such linguistic patterns, and we expect that our procedure can be applied to a wide range of expressions for which existing methods cannot be used. Our acquisition algorithm uses clues such as itemization or listing in HTML documents and statistical measures such as document frequencies and verb-noun co-occurrences.
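The abstract names two kinds of clues: itemizations in HTML and document frequencies. The sketch below illustrates only those two ideas, using Python's standard `html.parser`. It is not the paper's algorithm, which the abstract does not spell out. The names `ItemizationParser`, `extract_itemizations`, and `document_frequency` are hypothetical, and the sample documents are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class ItemizationParser(HTMLParser):
    """Collect the text of <li> items, grouped by their enclosing <ul>/<ol>."""

    def __init__(self):
        super().__init__()
        self.itemizations = []  # finished lists, each a list of item strings
        self._stack = []        # currently open lists (supports nesting)
        self._in_item = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._close_item()          # text before a nested list ends the item
            self._stack.append([])
        elif tag == "li" and self._stack:
            self._close_item()          # tolerate a missing </li>
            self._in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._close_item()
        elif tag in ("ul", "ol") and self._stack:
            self._close_item()
            items = self._stack.pop()
            if items:
                self.itemizations.append(items)

    def handle_data(self, data):
        if self._in_item:
            self._buf.append(data)

    def _close_item(self):
        if self._in_item:
            text = " ".join("".join(self._buf).split())  # normalize whitespace
            if text and self._stack:
                self._stack[-1].append(text)
        self._in_item = False
        self._buf = []


def extract_itemizations(html):
    """Return every itemization in one HTML document as a list of item lists."""
    parser = ItemizationParser()
    parser.feed(html)
    parser.close()
    return parser.itemizations


def document_frequency(docs):
    """Count, for each item string, the number of documents that list it."""
    df = Counter()
    for html in docs:
        seen = {item for items in extract_itemizations(html) for item in items}
        df.update(seen)
    return df


# Made-up sample documents, for illustration only.
docs = [
    "<h2>Cars</h2><ul><li>Toyota</li><li>Honda</li></ul>",
    "<ol><li>Honda</li><li>Nissan</li></ol>",
]
df = document_frequency(docs)
```

Items that share an itemization are candidates for a common hypernym, and document frequencies are one signal that could be used to rank candidate hypernyms for such a group. How the paper actually combines these clues with verb-noun co-occurrences is not described in the abstract.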
Similar resources
Hypernym Discovery Based on Distributional Similarity and Hierarchical Structures
This paper presents a new method of developing a large-scale hyponymy relation database by combining Wikipedia and other Web documents. We attach new words to the hyponymy database extracted from Wikipedia by using distributional similarity calculated from documents on the Web. For a given target word, our algorithm first finds k similar words from the Wikipedia database. Then, the hypernyms of...
Extracting Hyponyms of Prespecified Hypernyms from Itemizations and Headings in Web Documents
This paper describes a method to acquire hyponyms for given hypernyms from HTML documents on the WWW. We assume that a heading (or explanation) of an itemization (or listing) in an HTML document is likely to contain a hypernym of the items in the itemization, and we try to acquire hyponymy relations based on this assumption. Our method is obtained by extending Shinzato’s method (Shinzato and To...
Hyponym Extraction from the Web based on Property Inheritance of Text and Image Features
Concept hierarchy knowledge, such as hyponymy and meronymy, is very important for various Natural Language Processing systems. While WordNet and Wikipedia are being manually constructed and maintained as lexical ontologies, many researchers have tackled how to extract concept hierarchies from very large corpora of text documents such as the Web not manually but automatically. However, their met...
Boosting Precision and Recall of Hyponymy Relation Acquisition from Hierarchical Layouts in Wikipedia
This paper proposes an extension of Sumida and Torisawa’s method of acquiring hyponymy relations from hierarchical layouts in Wikipedia (Sumida and Torisawa, 2008). We extract hyponymy relation candidates (HRCs) from the hierarchical layouts in Wikipedia by regarding all subordinate items of an item x in the hierarchical layouts as x’s hyponym candidates, while Sumida and Torisawa (2008) extracted...
Discovering Multi Terms and Co-hyponymy from XHTML Documents with XTREEM
The Semantic Web needs ontologies as an integral component. Current methods for learning and enhancing ontologies need to be further improved to overcome the knowledge acquisition bottleneck. The identification of concepts and relations with only minimal user interaction is still a challenging objective. Current approaches performed to extract semantics often use association rules or clusterin...
Publication date: 2004